
5 Genomics and Epigenomics Analysis
5.1 CNVs
Import CNVs
5.1.1 Filtering & QC
CNVs collapsed at the gene level available for 24776 genes
How many are all zeros??
not a single one with all zeros :/
cutoff density = 5 because overwelming abundances of 0s
There is obviously erinchment in CNVs in KICH when compared to KIRC and KIRP. KIRP has as well an higher number of CNVs than KIRP.
5.1.2 CNVs signatures across Kidney cancers
Followed guidelines and methodlogies explored by Alexandrov et al., 2022 and Steele et al., 2022
We generates novel signatures with SigProfiler
Need to reformat as::
The CNV signatures you have will be binned based on copy number and segment size. It looks like the data you have is in the format:
Copy Number>:
Copy number = The number of alleles in the CNV Label = Name for the copy number event: Homdel = Homozygous deletion. This is CNA=0 LOH = Loss of heterozygousity. One of the parental alleles has been lost. Het = Heterzygous. Both parental alleles are present. Segment size = Size of the CNA segment.
Hope that helps!
TODO:
- add genomic ranges to genes
- merge consecutive genes spans with same CNV
- plot them on CIRCOs

5.1.3 Dimesionality Reduction and Dataset Exploration
5.1.3.1 UMAP on filtered transcriptomics data
After we excluded biopsys from normal tissues and other tumors and we filtered out the lowly expressed genes, the UMAP shows three clusters that are better refined than the ones depicted in Section 1.2.1, and that roughly correspondes with three different kidney cancer subtypes.

5.1.3.2 PCA

The first 24 Principal Components capture more than 80% of the variance in the Kidney cancers transcriptomics dataset, with the first two components (PC1 and PC2) capturing a bit more than 25% of the variance.
When we project the samples in the PC1 and PC2, we can see that the PC1 separates KIRC from KICH adn KIRP, which instead cluster together. The second component PC2, instead, seems to partially separate KICH and KIRP samples.

We can also investigate other dimensions Principal Components, to see if there is a component that manages to fully resove the three cancer types. PC4 seems to separates better the KICH from KIRP, while PC1 can discriminate between KIRC and KIRP.


5.1.4 CNVs on Factor3 genes

5.1.4.1 Mutational signatures associated to Factor3

5.2 Variants
SOMATIC MUTATIONS
(https://tcga-pancan-atlas-hub.s3.us-east-1.amazonaws.com/download/mc3.v0.2.8.PUBLIC.nonsilentGene.xena.gz)
711 samples with somatic mutation info available for 40542 genes, 26958 without mutations, 13584 with mutation

5.2.1 Mutational signatures across Kidney cancers

WHAT IS IN CHROMOSOME 3????????
Using external_gene_name as id variables
5.2.2 Dimesionality Reduction and Dataset Exploration
5.2.2.1 UMAP on filtered transcriptomics data
After we excluded biopsys from normal tissues and other tumors and we filtered out the lowly expressed genes, the UMAP shows three clusters that are better refined than the ones depicted in Section 1.2.1, and that roughly correspondes with three different kidney cancer subtypes.

5.2.2.2 PCA

The first 24 Principal Components capture more than 80% of the variance in the Kidney cancers transcriptomics dataset, with the first two components (PC1 and PC2) capturing a bit more than 25% of the variance.
When we project the samples in the PC1 and PC2, we can see that the PC1 separates KIRC from KICH adn KIRP, which instead cluster together. The second component PC2, instead, seems to partially separate KICH and KIRP samples.
We can also investigate other dimensions Principal Components, to see if there is a component that manages to fully resove the three cancer types. PC4 seems to separates better the KICH from KIRP, while PC1 can discriminate between KIRC and KIRP.


5.2.3 Somatic Mutations on Factor3 genes

5.2.3.1 Mutational signatures associated to Factor3

5.3 Epigenomics
646 samples with methylation info 148068 whitelisted probes
Regardless of array type, both the 450k and EPIC record two measurements for each CpG: a methylated intensity (M) and an unmethylated intensity (U). Using these values, the proportion of methylation at each site CpG locus can be determined. The level of methylation at a locus is commonly reported as the Beta-value, i.e. the ratio of the methylated probe intensity and the overall intensity:
beta = M/(M/U)
Illumina recommends adding a constant offset α (by default, α = 100) to the denominator to regularize Beta value when both methylated and unmethylated probe intensities are low. The Beta-value statistic results in a number between 0 and 1, or 0 and 100%. Under ideal conditions, a value of zero indicates that all copies of the CpG site in the sample were completely unmethylated (no methylated molecules were measured) and a value of one indicates that every copy of the site was methylated.
5.3.1 Filtering & QC
removed 6364 probes with missing values
148068-141704 [1] 6364

5.3.2 Dimesionality Reduction and Dataset Exploration
5.3.2.1 UMAP on filtered transcriptomics data
After we excluded biopsys from normal tissues and other tumors and we filtered out the lowly expressed genes, the UMAP shows three clusters that are better refined than the ones depicted in Section 1.2.1, and that roughly correspondes with three different kidney cancer subtypes.

5.3.2.2 PCA

The first 24 Principal Components capture more than 80% of the variance in the Kidney cancers transcriptomics dataset, with the first two components (PC1 and PC2) capturing a bit more than 25% of the variance.
When we project the samples in the PC1 and PC2, we can see that the PC1 separates KIRC from KICH adn KIRP, which instead cluster together. The second component PC2, instead, seems to partially separate KICH and KIRP samples.

We can also investigate other dimensions Principal Components, to see if there is a component that manages to fully resove the three cancer types. PC4 seems to separates better the KICH from KIRP, while PC1 can discriminate between KIRC and KIRP.


cg03830585 –> ITPR1 protects renal cancer cells against natural killer cells by inducing autophagy
cg00868875 –> KCTD1
cg07037412 –> CACNA1H Voltage-gated calcium channels: Novel targets for cancer therapy CACNA1H was downregulated in gastrointestinal stromal tumor, sarcoma and renal cancer Notably, compared with our previous research, CACNA1H was specifically overexpressed relative to normal tissue samples in renal cancer, sarcoma and gastrointestinal stromal tumors.

5.3.3 Probe-Wise Differential Methylation

print(summary(dt)) KIRC_vs_KICH KIRP_vs_KICH KIRC_vs_KIRP Down 56952 43222 69811 NotSig 45814 51748 42083 Up 45302 53098 36174


Tables of differentially methylated probes.
ds
5.3.3.1 Functional Enrichments
Let’s see if any function is enriched in the genes of the Differentially Methilated Probes. Simply extract the methylated probes, compare agains the background ans see if the associated genes are diofferentialy methylated, and see the fucntions / pathways associated with them.
GO terms
KEGG pathways
GO terms
KEGG
s
ss
s
s
s
s
s
Let’s compare clusters
s
d
s s



x



s
x ssss
s
5.3.3.2 Aaggreagte probes into functional regions
Often, differential methylation of a single CpG is not so informative or can be hard to detect. Therefore, knowing whether several CpGs near to each other (or regions) are concordantly differentially methylated can be of greater interest.
There are several Bioconductor packages that have functions for identifying differentially methylated regions from 450k data. Some of the most popular are the dmrFind function in the charm package, which has been somewhat superseded for 450k arrays by the bumphunter function in minfi, and, the dmrcate in the DMRcate package. They are each based on different statistical methods, but we will be using dmrcate here, as it is based on limma and thus we can use the design and contrast matrix we defined earlier.
We will again start from our matrix of M-values. For this kind of analysis, this matrix has to be annotated with the chromosomal position of the CpGs and their gene annotations. Because in a first step the limma differential methylation analysis for single CpGs will be run again, we need to specify the design matrix, contrast matrix and contrast of interest.
Once we have the relevant statistics for the individual CpGs, we can then use the dmrcate function to combine them to identify differentially methylated regions. Of particular interest here is the lambda parameter; this value is the number of nucleotides that is allowed between significant CpGs before splitting them up in different regions. So a smaller lambda will result in more but smaller regions. For array data, the authors of the dmrcate package currently recommend a lambda of 1000. The main output table DMRs contains all of the regions found, along with their genomic annotations and p-values. To inspect this object and further visualization, you can best use the extractRanges function to create a GRanges object.
Differentially Methylated regions
KIRC_vs_KICH: 20346 KIRP_vs_KICH: 19243 KIRC_vs_KIRP: 20773
Just as for the single CpG analysis, it is a good idea to visually inspect the results to make sure they make sense. For this, use the DMR.plot function. By default, this plot draws the location of the DMR in the genome, the position of nearby genes, the positions of the CpG probes, the Beta value levels of each sample as a heatmap and the mean methylation levels for the various sample groups in the experiment.
5.3.3.3 Functional Regions mCSEA
An alternative approach to detect DMRs is to predefine the regions to be tested; so, as opposed to the previous approach where the regions are defined according to heuristic distance rules we can define regions based on a shared function. For this, we will used the package mCSEA which contains three types of regions for 450K and EPIC arrays: promoter regions, gene body and CpG Islands. mCSEA is based on Gene Set Enrichment analysis (GSEA), a popular methodology for functional analysis that was specifically designed to avoid some drawbacks in the field of gene expression. Briefly, CpG sites are ranked according to a metric (logFC, t-statistic, …) and an enrichment score (ES) is calculated for each region. This is done by running through the entire ranked CpG list, increasing the score when a CpG in the region is encountered and decreasing the score when the gene encountered is not in the region. A high ES indicates these probes are found high up in the ranked list. In other words, a high (N)ES value means that for the CpG sites in this region there is - on average - a shift towards a higher methylation level. This approach has been shown to be more effective to detect smaller but consistent methylation differences.
promoters
genes
85promoters, 150 genes
74 promoters 0 genes
promoters
genes
172 promoters, 50 genes
promoters
genes
cc
5.3.3.3.1 mCSEA Functional Enrichments
Let’s see if any function is enriched in the genes of the Differentially Methilated Probes. Simply extract the methylated probes, compare agains the background ans see if the associated genes are diofferentialy methylated, and see the fucntions / pathways associated with them.
d








c
5.4 Lessons Learnt
[[[[PROPER DESCRIPTION OF FINDINGS AND TRANSCRIPTOMICS ENRICHMENTS]]]]
So far, we have learnt:
- A